Results 1 - 13 of 13
1.
Article in English | MEDLINE | ID: mdl-37027268

ABSTRACT

Existing methods for planar region segmentation suffer from vague boundaries and fail to detect small-sized regions. To address these problems, this study presents an end-to-end framework, named PlaneSeg, which can be easily integrated into various plane segmentation models. Specifically, PlaneSeg contains three modules: the edge feature extraction module, the multiscale module, and the resolution-adaptation module. First, the edge feature extraction module produces edge-aware feature maps for finer segmentation boundaries; the learned edge information acts as a constraint that mitigates inaccurate boundaries. Second, the multiscale module combines feature maps from different layers to harvest spatial and semantic information about planar objects. This multiformity of object information helps recognize small-sized objects and produce more accurate segmentation results. Third, the resolution-adaptation module fuses the feature maps produced by the two aforementioned modules, adopting a pairwise feature fusion to resample dropped pixels and extract more detailed features. Extensive experiments demonstrate that PlaneSeg outperforms other state-of-the-art approaches on three downstream tasks: plane segmentation, 3-D plane reconstruction, and depth prediction. Code is available at https://github.com/nku-zhichengzhang/PlaneSeg.
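The abstract gives no implementation details for the multiscale module; a rough NumPy sketch of the general idea of combining feature maps from different layers (nearest-neighbour upsampling, evenly dividing scales, and all names are assumptions, not the paper's design) might look like:

```python
import numpy as np

def upsample_nn(feat, scale):
    """Nearest-neighbour upsampling for a (C, H, W) feature map."""
    return feat.repeat(scale, axis=1).repeat(scale, axis=2)

def multiscale_fuse(feats):
    """Fuse (C, H_i, W_i) maps from different layers by upsampling
    every map to the finest resolution and concatenating on channels."""
    h = max(f.shape[1] for f in feats)
    ups = [upsample_nn(f, h // f.shape[1]) for f in feats]
    return np.concatenate(ups, axis=0)

coarse = np.random.rand(8, 4, 4)   # deep, semantic layer
fine = np.random.rand(4, 8, 8)     # shallow, spatial layer
fused = multiscale_fuse([coarse, fine])
print(fused.shape)  # (12, 8, 8)
```

Fusing spatially fine shallow features with semantically rich deep features in this way is what helps small objects survive into the final prediction.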

2.
IEEE Trans Pattern Anal Mach Intell ; 45(7): 8577-8593, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37015512

ABSTRACT

Image complexity (IC) is essential to how human beings visually perceive and understand an image. However, explicitly evaluating IC is challenging and has long been overlooked: on the one hand, its evaluation is relatively subjective because it depends on human perception; on the other hand, IC is semantic-dependent while real-world images are diverse. To facilitate research on IC assessment in the deep learning era, we built the first, to the best of our knowledge, large-scale IC dataset, with 9,600 well-annotated images. The images cover diverse areas such as abstract images, paintings, and real-world scenes, and each is elaborately annotated by 17 human contributors. Powered by this high-quality dataset, we further provide a base model to predict IC scores and estimate complexity density maps in a weakly supervised way. The model is verified to be effective and correlates well with human perception (with a Pearson correlation coefficient of 0.949). Last but not least, we have empirically validated that exploiting IC can provide auxiliary information and boost the performance of a wide range of computer vision tasks. The dataset and source code can be found at https://github.com/tinglyfeng/IC9600.
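The reported agreement with human perception is a Pearson correlation coefficient, a standard metric that can be computed directly (variable names here are illustrative, not from the paper's code):

```python
import numpy as np

def pearson(pred, human):
    """Pearson correlation between predicted IC scores and the
    human-annotated scores (the agreement metric the paper reports)."""
    p = np.asarray(pred, float)
    h = np.asarray(human, float)
    p = p - p.mean()
    h = h - h.mean()
    return float((p * h).sum() / np.sqrt((p ** 2).sum() * (h ** 2).sum()))

# Perfectly linearly related scores give a correlation of 1.0.
print(pearson([1, 2, 3], [2, 4, 6]))  # 1.0
```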

3.
IEEE Trans Pattern Anal Mach Intell ; 44(10): 6729-6751, 2022 10.
Article in English | MEDLINE | ID: mdl-34214034

ABSTRACT

Images can convey rich semantics and induce various emotions in viewers. Recently, with the rapid advancement of emotional intelligence and the explosive growth of visual data, extensive research efforts have been dedicated to affective image content analysis (AICA). In this survey, we comprehensively review the development of AICA over the past two decades, focusing especially on state-of-the-art methods with respect to three main challenges: the affective gap, perception subjectivity, and label noise and absence. We begin with an introduction to the key emotion representation models that have been widely employed in AICA and a description of the available datasets for evaluation, with a quantitative comparison of label noise and dataset bias. We then summarize and compare representative approaches to (1) emotion feature extraction, including both handcrafted and deep features; (2) learning methods for dominant emotion recognition, personalized emotion prediction, emotion distribution learning, and learning from noisy data or few labels; and (3) AICA-based applications. Finally, we discuss challenges and promising future research directions, such as image content and context understanding, group emotion clustering, and viewer-image interaction.


Subjects
Algorithms; Emotions; Image Processing, Computer-Assisted
4.
IEEE Trans Cybern ; 52(10): 10000-10013, 2022 Oct.
Article in English | MEDLINE | ID: mdl-33760749

ABSTRACT

Thanks to large-scale labeled training data, deep neural networks (DNNs) have obtained remarkable success in many vision and multimedia tasks. However, because of the presence of domain shift, the learned knowledge of the well-trained DNNs cannot be well generalized to new domains or datasets that have few labels. Unsupervised domain adaptation (UDA) studies the problem of transferring models trained on one labeled source domain to another unlabeled target domain. In this article, we focus on UDA in visual emotion analysis for both emotion distribution learning and dominant emotion classification. Specifically, we design a novel end-to-end cycle-consistent adversarial model, called CycleEmotionGAN++. First, we generate an adapted domain to align the source and target domains on the pixel level by improving CycleGAN with a multiscale structured cycle-consistency loss. During the image translation, we propose a dynamic emotional semantic consistency loss to preserve the emotion labels of the source images. Second, we train a transferable task classifier on the adapted domain with feature-level alignment between the adapted and target domains. We conduct extensive UDA experiments on the Flickr-LDL and Twitter-LDL datasets for distribution learning and ArtPhoto and Flickr and Instagram datasets for emotion classification. The results demonstrate the significant improvements yielded by the proposed CycleEmotionGAN++ compared to state-of-the-art UDA approaches.
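The abstract does not spell out the losses; a minimal single-scale sketch of the cycle-consistency term (the paper's multiscale structured version and generator architectures are not reproduced here; `G_st` and `G_ts` are hypothetical stand-ins for the source-to-target and target-to-source translators) could be:

```python
import numpy as np

def cycle_consistency_loss(x, G_st, G_ts):
    """Single-scale L1 cycle loss: translating source -> target -> source
    should reconstruct the input image x."""
    return float(np.abs(G_ts(G_st(x)) - x).mean())

# With translators that exactly invert each other, the loss vanishes.
loss = cycle_consistency_loss(np.zeros((2, 2)),
                              lambda x: x + 1.0,
                              lambda x: x - 1.0)
print(loss)  # 0.0
```

In the full model this reconstruction term is combined with the dynamic emotional semantic consistency loss so that translation preserves emotion labels, not just pixels.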


Subjects
Neural Networks, Computer; Semantics; Emotions; Humans
5.
IEEE Trans Image Process ; 31: 1134-1148, 2022.
Article in English | MEDLINE | ID: mdl-34932477

ABSTRACT

The success of deep convolutional networks (ConvNets) generally relies on a massive amount of well-labeled data, which is labor-intensive and time-consuming to collect and annotate in many scenarios. To overcome this limitation, self-supervised learning (SSL) has recently been proposed. Specifically, by solving a pre-designed proxy task, SSL is capable of capturing general-purpose features without requiring human supervision. Existing efforts focus heavily on designing a particular proxy task but ignore the semanticity of samples, which is advantageous to downstream tasks; this results in the inherent limitation that the learned features are specific to the proxy task, namely the proxy task-specificity of features. In this work, to improve the generalizability of features learned by existing SSL methods, we present a novel self-supervised framework, SSL++, that incorporates the proxy task-independent semanticity of samples into the representation learning process. Technically, SSL++ leverages the complementarity between the low-level generic features learned by a proxy task and the high-level semantic features newly learned via generated semantic pseudo-labels to mitigate the task-specificity and improve the generalizability of features. Extensive experiments show that SSL++ performs favorably against state-of-the-art approaches on established and recent SSL benchmarks.
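How SSL++ generates its semantic pseudo-labels is not described in the abstract; one common way to obtain such labels is to cluster frozen proxy-task features, which can be sketched as plain k-means (the deterministic initialization and everything else here is an assumed stand-in, not the paper's method):

```python
import numpy as np

def pseudo_labels(features, n_clusters, n_iter=10):
    """Cluster frozen proxy-task features into semantic pseudo-labels
    via k-means (initialized from the first n_clusters points so the
    sketch is deterministic)."""
    centroids = features[:n_clusters].astype(float).copy()
    labels = np.zeros(len(features), dtype=int)
    for _ in range(n_iter):
        # Assign each sample to its nearest centroid.
        d = np.linalg.norm(features[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        # Recompute centroids from the current assignment.
        for k in range(n_clusters):
            if (labels == k).any():
                centroids[k] = features[labels == k].mean(axis=0)
    return labels
```

A second classification head trained on these labels then supplies the high-level semantic signal that complements the proxy task's low-level features.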


Subjects
Supervised Machine Learning; Humans
6.
IEEE Trans Image Process ; 30: 8727-8742, 2021.
Article in English | MEDLINE | ID: mdl-34613915

ABSTRACT

Multi-level feature fusion is a fundamental topic in computer vision; it has been exploited to detect, segment, and classify objects at various scales. When multi-level features meet multi-modal cues, the optimal feature aggregation and multi-modal learning strategies become open questions. In this paper, we leverage the inherent multi-modal and multi-level nature of RGB-D salient object detection to devise a novel Bifurcated Backbone Strategy Network (BBS-Net). Our architecture is simple, efficient, and backbone-independent. In particular, we first propose to regroup the multi-level features into teacher and student features using a bifurcated backbone strategy (BBS). Second, we introduce a depth-enhanced module (DEM) to excavate informative depth cues from the channel and spatial views. The RGB and depth modalities are then fused in a complementary way. Extensive experiments show that BBS-Net significantly outperforms 18 state-of-the-art (SOTA) models on eight challenging datasets under five evaluation measures, demonstrating the superiority of our approach (~4% improvement in S-measure vs. the top-ranked model, DMRA). In addition, we provide a comprehensive analysis of the generalization ability of different RGB-D datasets and provide a powerful training set for future research. The complete algorithm, benchmark results, and post-processing toolbox are publicly available at https://github.com/zyjwuyan/BBS-Net.
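The internals of the DEM are not given in the abstract; a toy sketch of the channel-then-spatial attention idea it alludes to ("channel and spatial views"), with every operation here assumed rather than taken from the paper, might be:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def depth_enhanced(depth_feat):
    """Toy DEM: channel attention from global average pooling, then
    spatial attention from the channel-wise mean, on a (C, H, W) map."""
    c_att = sigmoid(depth_feat.mean(axis=(1, 2)))[:, None, None]  # (C,1,1)
    f = depth_feat * c_att
    s_att = sigmoid(f.mean(axis=0))[None]                         # (1,H,W)
    return f * s_att

def fuse_rgbd(rgb_feat, depth_feat):
    """Complementary fusion: add the enhanced depth cues to RGB features."""
    return rgb_feat + depth_enhanced(depth_feat)
```

The attention stages suppress noisy depth channels and regions before the two modalities are merged.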

7.
IEEE Trans Image Process ; 30: 6512-6527, 2021.
Article in English | MEDLINE | ID: mdl-34252026

ABSTRACT

Deep learning (DL) is inherently subject to the requirement of a large amount of well-labeled data, which is expensive and time-consuming to obtain manually. To broaden the reach of DL, leveraging free web data has become an attractive strategy for alleviating data scarcity. However, directly using collected web data to train a deep model is ineffective because of the mixed noisy data. To address this problem, we develop a novel bidirectional self-paced learning (BiSPL) framework that reduces the effect of noise by learning from web data in a meaningful order. Technically, the BiSPL framework consists of two essential steps. First, relying on distances defined between web samples and labeled source samples, the web samples with short distances are sampled and combined to form a new training set. Second, based on the new training set, both easy and hard samples are initially employed to train deep models for higher stability, and hard samples are gradually dropped to reduce the noise as training progresses. By iteratively alternating these steps, deep models converge to a better solution. We mainly focus on fine-grained visual classification (FGVC) tasks because their datasets are generally small and therefore face a more significant data scarcity problem. Experiments conducted on six public FGVC tasks demonstrate that our proposed method outperforms state-of-the-art approaches. In particular, BiSPL maintains the highest stable performance even when the scale of the well-labeled training set decreases dramatically.
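The two steps above can be sketched in NumPy as follows (the Euclidean distance, the selection sizes, and the function names are all assumptions, not the paper's actual formulation):

```python
import numpy as np

def select_web_samples(web_feats, source_feats, k):
    """Step 1: keep the k web samples closest in feature space to
    any labeled source sample."""
    d = np.linalg.norm(web_feats[:, None] - source_feats[None],
                       axis=2).min(axis=1)
    return np.argsort(d)[:k]

def pace_schedule(losses, keep_frac):
    """Step 2: as training progresses, drop the hardest (highest-loss)
    samples; keep_frac shrinks toward the easy, low-noise core."""
    k = max(1, int(len(losses) * keep_frac))
    return np.argsort(losses)[:k]
```

Alternating these two selections is what gives the framework its "bidirectional" character: the model refilters the web pool, and the filtered pool retrains the model.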

8.
IEEE Trans Image Process ; 30: 6730-6743, 2021.
Article in English | MEDLINE | ID: mdl-34283714

ABSTRACT

A simultaneous understanding of questions and images is crucial in Visual Question Answering (VQA). While existing models have achieved satisfactory performance by associating questions with key objects in images, the answers also contain rich information that can be used to describe the visual content of images. In this paper, we propose a re-attention framework that utilizes the information in answers for the VQA task. The framework first learns initial attention weights for the objects by calculating the similarity of each word-object pair in the feature space. Then, the visual attention map is reconstructed by re-attending to the objects in images based on the answer. By keeping the initial visual attention map consistent with the reconstructed one, the learned visual attention map can be corrected by the answer information. In addition, we introduce a gate mechanism that automatically controls the contribution of re-attention to model training based on the entropy of the learned initial visual attention maps. We conduct experiments on three benchmark datasets, and the results demonstrate that the proposed model performs favorably against state-of-the-art methods.
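The entropy-based gate can be illustrated with a small sketch (the exact gating function used by the paper is not given in the abstract; the linear form below is an assumption):

```python
import numpy as np

def attention_entropy(att):
    """Shannon entropy of a flattened, normalized attention map."""
    p = np.asarray(att, float).ravel()
    p = p / p.sum()
    return float(-(p * np.log(p + 1e-12)).sum())

def reattention_gate(att):
    """Gate in [0, 1]: confident (low-entropy) initial attention lets
    the answer-driven re-attention term contribute more; a uniform
    (maximum-entropy) map drives the gate toward 0."""
    h = attention_entropy(att)
    return 1.0 - h / np.log(np.asarray(att).size)
```

A peaked attention map over one object yields a gate near 1, while a flat map over all objects yields a gate near 0.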

9.
IEEE Trans Pattern Anal Mach Intell ; 42(6): 1537-1544, 2020 Jun.
Article in English | MEDLINE | ID: mdl-31056488

ABSTRACT

Finding the informative subspaces of high-dimensional datasets is at the core of numerous applications in computer vision, where spectral-based subspace clustering is arguably the most widely studied method due to its strong empirical performance. Such algorithms first compute an affinity matrix to construct a self-representation for each sample using other samples as a dictionary. Sparsity and connectivity of the self-representation play important roles in effective subspace clustering. However, simultaneous optimization of both factors is difficult due to their conflicting nature, and most existing methods are designed to address only one factor. In this paper, we propose a post-processing technique to optimize both sparsity and connectivity by finding good neighbors. Good neighbors induce key connections among samples within a subspace and not only have large affinity coefficients but are also strongly connected to each other. We reassign the coefficients of the good neighbors and eliminate other entries to generate a new coefficient matrix. We show that the few good neighbors can effectively recover the subspace, and the proposed post-processing step of finding good neighbors is complementary to most existing subspace clustering algorithms. Experiments on five benchmark datasets show that the proposed algorithm performs favorably against the state-of-the-art methods with negligible additional computation cost.
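A rough sketch of this post-processing idea, keeping only coefficients that are both large and mutual (the paper's actual reassignment rule may differ; the symmetrization and top-k criterion here are assumptions), could be:

```python
import numpy as np

def good_neighbors(C, k):
    """Post-process a self-representation matrix C: for each sample keep
    its k largest symmetrized coefficients, then retain only mutual
    picks, i.e. neighbors that are strongly connected to each other."""
    A = (np.abs(C) + np.abs(C).T) / 2.0
    np.fill_diagonal(A, 0.0)
    keep = np.zeros_like(A, dtype=bool)
    for i in range(A.shape[0]):
        keep[i, np.argsort(A[i])[-k:]] = True
    # A "good" connection must be picked from both sides.
    return np.where(keep & keep.T, A, 0.0)
```

Zeroing the non-mutual entries sparsifies the affinity graph while the mutual requirement preserves within-subspace connectivity, which is the trade-off the paper targets.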

10.
IEEE Trans Neural Netw Learn Syst ; 31(8): 2832-2846, 2020 08.
Article in English | MEDLINE | ID: mdl-31199274

ABSTRACT

Class imbalance is a challenging problem in many classification tasks. It induces biased classification results for minority classes that contain fewer training samples than others. Most existing approaches aim to remedy the imbalanced number of instances among categories by resampling the majority and minority classes accordingly. However, the imbalanced level of difficulty of recognizing different categories is also crucial, especially when distinguishing samples among many classes. For example, in clinical skin disease recognition, several rare diseases have a small number of training samples but are easy to diagnose because of their distinct visual properties. On the other hand, some common skin diseases, e.g., eczema, are hard to recognize due to the lack of distinctive symptoms. To address this problem, we propose a self-paced balance learning (SPBL) algorithm. Specifically, we introduce a comprehensive metric termed the complexity of an image category, which combines both the sample number and the recognition difficulty. First, the complexity is initialized using the model of the first pace, where a pace indicates one iteration of the self-paced learning paradigm. We then assign each class a penalty weight that is larger for more complex categories and smaller for easier ones, after which the curriculum is reconstructed by rearranging the training samples. Consequently, the model can iteratively learn discriminative representations by balancing the complexity in each pace. Experimental results on the SD-198 and SD-260 benchmark datasets demonstrate that the proposed SPBL algorithm performs favorably against state-of-the-art methods. We also demonstrate the generalization capacity of the SPBL algorithm on various tasks, such as indoor scene image recognition and object classification.
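A toy version of the complexity metric and per-class penalty weights (the combination rule and `alpha` are assumptions made for illustration; the paper's exact definition is not given in the abstract) might look like:

```python
import numpy as np

def class_complexity(n_samples, error_rate, alpha=0.5):
    """Complexity of an image category: scarcity (few samples) and
    difficulty (high validation error) both raise it; alpha trades
    the two factors off."""
    scarcity = 1.0 / np.asarray(n_samples, float)
    scarcity = scarcity / scarcity.max()
    difficulty = np.asarray(error_rate, float)
    return alpha * scarcity + (1 - alpha) * difficulty

def penalty_weights(complexity):
    """Larger loss weight for more complex categories, normalized so
    the weights average to 1 across classes."""
    return complexity / complexity.sum() * len(complexity)
```

A rare-but-distinctive disease (few samples, low error) thus gets a smaller weight than a common-but-confusing one would under a purely count-based scheme.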


Subjects
Algorithms; Machine Learning; Pattern Recognition, Automated/methods; Skin Diseases/diagnosis; Databases, Factual/statistics & numerical data; Humans; Pattern Recognition, Automated/statistics & numerical data
11.
IEEE Trans Image Process ; 28(8): 3973-3985, 2019 Aug.
Article in English | MEDLINE | ID: mdl-30843836

ABSTRACT

In this paper, we propose a unified framework that simultaneously discovers the number of clusters and groups the data points into different clusters using subspace clustering. Real data distributed in a high-dimensional space can be disentangled into a union of low-dimensional subspaces, which can benefit various applications. To explore such intrinsic structure, state-of-the-art subspace clustering approaches often optimize a self-representation problem among all samples to construct a pairwise affinity graph for spectral clustering. However, a graph with pairwise similarities lacks robustness for segmentation, especially for samples that lie on the intersection of two subspaces. To address this problem, we design a hyper-correlation-based data structure termed the triplet relationship, which reveals high relevance and local compactness among three samples. The triplet relationship can be derived from the self-representation matrix and utilized to iteratively assign the data points to clusters. Based on the triplet relationship, we propose a unified optimization scheme to automatically calculate clustering assignments. Specifically, we optimize a model selection reward and a fusion reward by simultaneously maximizing the similarity of triplets from different clusters while minimizing the correlation of triplets from the same cluster. The proposed algorithm also automatically reveals the number of clusters and fuses groups to avoid over-segmentation. Extensive experimental results on both synthetic and real-world datasets validate the effectiveness and robustness of the proposed method.
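Deriving triplets from a self-representation matrix can be sketched as follows (the membership criterion below, mutual top-k neighborhoods among all three samples, is an assumed simplification of the paper's definition, not its actual construction):

```python
import numpy as np

def triplet_relationships(C, k):
    """Extract triplets (i, j, l) from a self-representation matrix C:
    three samples form a triplet when every pair is within each
    other's k strongest symmetrized coefficients, capturing both high
    relevance and local compactness."""
    A = np.abs(C) + np.abs(C).T
    np.fill_diagonal(A, 0.0)
    n = A.shape[0]
    nbrs = [set(np.argsort(A[i])[-k:]) for i in range(n)]
    triplets = []
    for i in range(n):
        for j in nbrs[i]:
            if i >= j or i not in nbrs[j]:
                continue
            for l in nbrs[i] & nbrs[j]:
                if l > j and i in nbrs[l] and j in nbrs[l]:
                    triplets.append((i, int(j), int(l)))
    return triplets
```

On a block-diagonal affinity matrix, each fully connected block of three samples yields exactly one triplet, illustrating why triplets are more robust than pairwise edges at subspace intersections.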

12.
IEEE Trans Image Process ; 27(11): 5303-5315, 2018 Nov.
Article in English | MEDLINE | ID: mdl-30010575

ABSTRACT

Leveraging the abundance of web data is a promising strategy for addressing the problem of data scarcity when training convolutional neural networks (CNNs). However, web images often carry incorrect tags, which may compromise the learned CNN model. To address this problem, this paper focuses on image classification and proposes to iterate between filtering out noisy web labels and fine-tuning the CNN model using the crawled web images. Overall, the proposed method benefits from the growing modeling capability of the learned model to correct labels for web images, and learns from such new data to produce a more effective model. Our contribution is two-fold. First, we propose an iterative method that progressively improves the discriminative ability of CNNs and the accuracy of web image selection. This method is beneficial for selecting high-quality web training images and expanding the training set as the model improves. Second, since web images are usually complex and may not be accurately described by a single tag, we propose to assign each web image multiple labels to reduce the impact of hard label assignment. This labeling strategy mines more training samples to improve the CNN model. In the experiments, we crawl 0.5 million web images covering all categories of four public image classification datasets. Compared with a baseline trained without web images, the proposed method brings notable improvement. We also report competitive recognition accuracy compared with the state of the art.
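The multi-label assignment idea can be illustrated with a small sketch (using the model's own softmax scores to pick the top-k labels; the value of `k` and the renormalization are assumptions for illustration):

```python
import numpy as np

def soft_multi_labels(logits, k=2):
    """Assign a web image its top-k predicted labels with renormalized
    soft weights, instead of one hard tag."""
    p = np.exp(logits - logits.max())  # stable softmax
    p /= p.sum()
    top = np.argsort(p)[-k:][::-1]
    w = p[top] / p[top].sum()
    return list(zip(top.tolist(), w.tolist()))
```

Softening the assignment keeps ambiguous images in the training pool rather than discarding them over a single wrong tag.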

13.
IEEE Trans Image Process ; 27(11): 5288-5302, 2018 Nov.
Article in English | MEDLINE | ID: mdl-29994213

ABSTRACT

For image retrieval methods based on a bag of visual words, much attention has been paid to enhancing the discriminative power of local features. Although retrieved images are usually similar to a query in minutiae, they may be significantly different from a semantic perspective, which can be effectively distinguished by convolutional neural networks (CNNs). Such images should not be considered relevant pairs. To tackle this problem, we propose to construct a dynamic match kernel by adaptively calculating the matching thresholds between query and candidate images based on the pairwise distance among deep CNN features. In contrast to the typical static match kernel, which is independent of the global appearance of retrieved images, the dynamic one leverages semantic similarity as a constraint for determining matches. Accordingly, we propose a semantic-constrained retrieval framework incorporating the dynamic match kernel, which focuses on matched patches between relevant images and filters them out for irrelevant pairs. Furthermore, we demonstrate that the proposed kernel complements recent methods such as Hamming embedding, multiple assignment, local descriptor aggregation, and graph-based re-ranking, while outperforming the static kernel under various settings on off-the-shelf evaluation metrics. We also propose to evaluate the matched patches both quantitatively and qualitatively. Extensive experiments on five benchmark datasets and large-scale distractors validate the merits of the proposed method against state-of-the-art methods for image retrieval.
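A minimal sketch of an adaptive matching threshold driven by deep-feature distance (the exponential decay and `gamma` are assumptions made for illustration, not the paper's kernel):

```python
import numpy as np

def dynamic_threshold(q_feat, c_feat, base_thresh, gamma=1.0):
    """Tighten the local-feature matching threshold for semantically
    distant pairs: the larger the CNN-feature cosine distance between
    query and candidate, the smaller the threshold."""
    q = np.asarray(q_feat, float)
    c = np.asarray(c_feat, float)
    q /= np.linalg.norm(q)
    c /= np.linalg.norm(c)
    d = 1.0 - float(q @ c)  # cosine distance in [0, 2]
    return base_thresh * np.exp(-gamma * d)

def match(hamming_dist, q_feat, c_feat, base_thresh=24, gamma=1.0):
    """A local match survives only if its Hamming distance clears the
    semantically adapted threshold."""
    return hamming_dist <= dynamic_threshold(q_feat, c_feat, base_thresh, gamma)
```

Semantically identical pairs keep the full static threshold, while unrelated images must pass a much stricter test, which is how matched patches get filtered out for irrelevant pairs.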
